High-Throughput Function Assignment for Novel Gene Products Using Annotation Clustering
Abstract
We have designed and implemented a software package for automatic, high-throughput prediction of gene function. The system attempts to assign a biological function to protein sequences by searching sequence databanks and by locating functionally relevant motifs in the query sequences. The results produced by the various prediction methods consist of the annotations of matching sequences and/or motifs. These annotations are free-format texts written by humans and may therefore describe the same concept with synonymous words. We wanted to present the results so that annotations describing the same biological function are grouped together and the user does not need to read through all of them. To this end we devised an algorithm that hierarchically clusters free-format documents based on the similarity of their contents. This poster presents an enhanced version of our previously published method [1].
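The grouping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a simple bag-of-words representation, Jaccard similarity between token sets, and average-linkage agglomerative clustering with a cutoff. The function names (`tokenize`, `jaccard`, `cluster_annotations`), the `threshold` value, and the sample annotations are all hypothetical. The sketch matches only shared words, so it does not address true synonyms.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lower-case a free-text annotation and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 = identical, 0.0 = disjoint)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_annotations(annotations, threshold=0.3):
    """Agglomerative clustering with average linkage (illustrative only).

    Repeatedly merges the two most similar clusters until no pair has an
    average pairwise similarity of at least `threshold` (an assumed cutoff).
    Returns a sorted list of clusters, each a sorted list of input indices.
    """
    tokens = [tokenize(a) for a in annotations]
    clusters = [[i] for i in range(len(annotations))]

    def linkage(c1, c2):
        # Average similarity over all cross-cluster annotation pairs.
        sims = [jaccard(tokens[i], tokens[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, (x, y) = max(
            (linkage(clusters[p], clusters[q]), (p, q))
            for p, q in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical annotations, as might be returned by database hits.
hits = [
    "serine/threonine protein kinase",
    "protein kinase, serine/threonine specific",
    "zinc finger transcription factor",
    "transcription factor with zinc finger domain",
]
print(cluster_annotations(hits))  # [[0, 1], [2, 3]]
```

Recording the order and similarity of each merge, rather than stopping at a single cutoff, would give the full hierarchy. A production system would likely also weight terms and map synonyms to a common form before comparing annotations.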